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•'-H . Abstract 

We present a simulation algorithm for dynamical fermions that combines the 
multiboson technique with the Hybrid Monte Carlo algorithm. We find that the 
algorithm gives a substantial gain over the standard methods in practical simula- 
tions. We point out the ability of the algorithm to treat fermion zero modes in a 
clean and controllable manner. 



In this letter we want to present a new algorithm for simulations of dynamical fermions. Its 
basic conceptual idea is to separate out the low-lying eigenvalues of the Wilson-Dirac operator 
on the lattice and to not take this part of the spectrum into account for the generation of the 
gauge field configurations. However, the algorithm can be made exact by incorporating the 
low-lying eigenvalues into the observables, or, alternatively, by adding them via a reject/accept 
step. 

From a principle point of view, the separation of the eigenvalue spectrum into a high and 
low frequency part allows to monitor the low-lying eigenvalues. In particular, the algorithm 
offers the possibility to detect the appearance of eventual zero modes and to control their effects 
on physical observables. The low-lying eigenvalues are also expected to play an important role 
in practice as they slow down the fermion simulation algorithms, when approaching the chiral 
limit. Cutting these modes off, should therefore result in a gain for the cost of a practical 
simulation. 

The basic building blocks of the algorithm are the standard HMC algorithm [|I|] and the 
multiboson technique to simulate dynamical fermions |2|. A similar idea has been presented 
shortly in ||. In the multiboson technique, the inverse fermion matrix is approximated by a 
polynomial written in powers of the fermion matrix. We propose to take this polynomial to 
define the -approximate- interaction of the fermions. 

To be specific, let us consider the path integral for Wilson fermions on the lattice 

Z= fvUexp {-S g }det(Q 2 ) = fvUV^V(j)exp {-S g - ^Q~ 2 (j)} . (1) 

The term S g in the exponential is the pure gauge action and is given by 

S g = -j; y ETr{U P + rt P ) . (2) 

o P 

The symbol Up represents the usual plaquette term on the lattice with gauge links taken from 
SU(3). The determinant factor det(Q 2 ) accounts for the contribution of virtual fermion loops 
to the path integral. The bosonic fields carry spinor, flavour and colour indices. In eq.(|l]) 
and in the following we are assuming that we have two mass-degenerate flavours. The matrix 
Q that appears in the determinant is a hermitian sparse matrix defined by: 

Q(U)x,y = CoJ S [S X)V - K^2(l - J^Ux^Sx+^y + (I + TaO^J-M./A-AwI > ( 3 ) 

with k the so-called hopping parameter, related to the bare quark mass mo by k = (8 + 2mo) -1 , 
and c = [cm{1 + 8k)] -1 , where cm should be chosen such that the eigenvalues A of Q satisfy 
|A| < 1. 

Let us assume that we have constructed a polynomial P n of degree n such that 

det \Q 2 P n (Q 2 )] -> 1 for n -> oo , (4) 



with P n {\(Q 2 )) > for all the eigenvalues X(Q 2 ) of Q 2 in the range < \(Q 2 ) < 1. Then we 
can rewrite the determinant, 

d6tW > = det [*.«?)] ' (5) 

Each of the two determinants on the right-hand side can be represented as a Gaussian integral 
with the help of bosonic fields and 77, respectively. The partition function becomes 

Z = f ' VUVtfVQVtfVTJW exp {-S g - f P n (Q 2 )0 - 77*77} (6) 

where we introduced the "correction factor" 

W = exp jr/t (l - [Q 2 P n (Q 2 )} "*) r/| . (7) 

Note that eq.@ is an exact rewriting of the partition function eq. ([!]). 

With the introduction of the correction factor W the expectation value of an observable O 

is now computed as 

<OW> P 

where the averages < . . . >p are taken with respect to the measure defined through the ap- 
proximate fermion action 0* P n (Q 2 )(f) . Alternatively one may incorporate the W factor via a 
reject/accept step. 

Let us now specify the form of the polynomial that we are going to take. We choose a 
Chebyshev polynomial to approximate Q~ 2 . When written in its factorized form 



n n 



P[n,e]{Q 2 ) = C nU [Q 2 ~ Zk] =CnU [(Q ~ V^k*)(Q ~ yffi)] , (9) 

fc=l " fc=l 

it is characterized by its roots (for k — 1, 2, . . . , n), 

*k = 2 (1 + e ) " 2^ + e ) C0S (^l) ~ i>/isin( ^Tl ) (10) 

and a normalization factor c^, which is explicitly calculable 0. The polynomial P[ n)£ ](s) ap- 
proximates the function 1/s (where s may correspond to any of the eigenvalues of Q 2 ) uniformly 
in the interval e < s < 1. The relative fit error 

P M (s) = [P M (s) - 1/s] s (11) 

in this interval is exponentially small: 

\Rm(*)\ < 2 (t^)^ • ( 12 ) 

Let us finally introduce an accuracy parameter 5, which is actually an upper bound to the 
maximum relative error of the polynomial approximation, 

'- a te£T- (i3) 
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The parameter 5 provides an easily computable and conservative measure of how well the chosen 
polynomial approximates 1/s in the given interval e < s < 1. 

With the specification of the polynomial eq. (|j) the path integral eq. @ and the correction 
factor W are fully determined. It is clear that for polynomials of high degree, the interaction 
defined by them becomes too complicated for the application of local algorithms. It is therefore 
a natural choice to use molecular dynamics algorithms like the HMC or the Kramers equation 
algorithms 0,|| . In the following we will call our hybrid of molecular dynamics and multiboson 
algorithms the Polynomial Hybrid Monte Carlo (PHMC) algorithm. 

What do we expect from the PHMC algorithm? It has been suggested that the lowest 
eigenvalue of Q 2 , \ m i n (Q 2 ), is an important quantity in determining the cost of the HMC 
algorithm. In particular, a theoretical analysis leads to the -optimistic- estimate that the cost 
grows as l/A T7 ;^ ri (Q 2 ). In the PHMC algorithm, the role of the lowest eigenvalue is taken over 
by the infrared cut-off parameter e. For < s < e the polynomial P\ ni€ \(s) is always finite with 
values 0(1/ e). From the experience with the multiboson technique |7|, |S], |9|, |TD|, |TT| it has become 
clear that one might choose e to be substantially larger than \ m i n (Q 2 ) while still getting values 
for expectation values that are compatible with the ones obtained by the HMC algorithm. 
This result suggests that one might choose e > X m in(Q 2 ) also in the PHMC algorithm without 
introducing too large fluctuations of the correction factor. Since in the PHMC algorithm only 
one bosonic field is introduced, one will also avoid the dangerous increase of the autocorrelation 
time with increasing degree of the polynomial as found for the multiboson technique M. 

Before we turn to the results for the performance of the PHMC algorithm, let us shortly 
sketch, how the algorithm is implemented in our simulation program. We will be quite short 
here und refer to a forthcoming publication for more details and safety measures, in particular 
when using the algorithm on a 32-bit arithmetics machine. Let us start by discussing the 
heatbath for the bosonic fields 0, the action of which is given by 

S 6 = 0tp [n , e] (Q 2 )0. (14) 

To generate a Gaussian distribution according to this interaction, we proceed as follows. We first 
generate a Gaussian random vector (. We then solve Q 2 P[ n ^](Q 2 )X = Q 2 ( using a Conjugate 
Gradient (CG) method. By writing P[ n ,e](Q 2 ) = Pn/2(Q 2 )^n/2(Q 2 ) with appropriate ordering 
of the roots, we finally construct the 0-fields via = P*, 2 (Q 2 )X. 

The derivation of the force for the PHMC algorithm is done in complete analogy to the 
method used for the HMC algorithm [Tj|. A variation of the action eq.flUP using the polynomial 



eq.([|) reveals that one has to construct the vectors 

<f>i=H[Q-V^]<!>o ;j = l,...,2n-l (15) 



fc=i 



with 0o the bosonic field generated by the boson heat bath. The vectors 0j are precalculated 
and stored. This calculation may be organized in such a way that the memory storage required 



amounts to only n + 1 (instead of 2n) vectors <fij. One may then use them in the actual force 
computation at the appropriate places. 

The above storage requirement for the vectors <pj may be further reduced by introducing 
more bosonic field copies. For example, one may split the polynomial into two parts Pl/l and 



P^i satisfying det 



P\nAQ 2 



det 



'p(i) 

-Si/2 



(Q 2 



det 



n/2 

Pn/ 2 {Q 2 ) an d integrate each contribution 



separately, leading to the action 



s b = 4Pi%(Q 2 )<Pi + 4Pi%(Q 2 )<h 



(16) 



It is amusing to note that by iterating this procedure one can obtain an interpolation between 
a nearly exact HMC algorithm, if n and 1/e are large enough, and the multiboson technique to 
simulate dynamical fermions. Although we did not yet perform an extended analysis, our first 
results indicate that introducing a few bosonic field copies does not increase the autocorrelation 
time, when the number of copies is held small, say less than about eight. It remains a subject 
of further study, however, whether the molecular dynamics behaviour is severely altered by the 
introduction of more field copies. 

Finally, the computation of the correction factor W eq.(|7|) needs an additional inversion of 
Q 2 P[n,e]{Q 2 ) ■ Since the 77-field occurring in W is completely independent from the 0-field in the 
boson heatbath, this inversion has to be done separately. 

We decided to test the PHMC against the HMC algorithm on 4 4 and 8 4 lattices. All 
numerical results have been obtained on Alenia Quadrics (APE) massively parallel computers. 
We adopted Schrodinger functional boundary conditions. For the 4 4 lattice we ran at /3 — 
6.4, k = 0.15 and for the 8 4 lattice we had (3 = 5.6 and k = 0.1585 ~ k c [|i~3 |. In both, 
the HMC and the PHMC algorithms, even-odd preconditioning [Q and a Sexton- Weingarten 



leap-frog integration scheme juj] is implemented. We want to emphasize that most of the 
improvements to accelerate the HMC algorithm can be taken over to the PHMC algorithm. 
We think, therefore, that the results of the comparison we are performing here should be 
independent of the particular implementation. 



Table 1: Technical parameters for both algorithms 





HMC 


PHMC 


Lattice 


tmd 




Nmd 


^md 


N md 


e 


n 


cm 


44 
8 4 


0.25 
0.075 




4 
13 


0.25 
0.09 


4 
10 


0.036 
0.0026 


12 
48 


0.5789 
0.5789 



In table 1, we give the parameters of the algorithms which are the step size e m d and the 
number of molecular dynamics steps N m d as used for the leap frog integration. We also give 
the parameters characterizing the polynomial. The parameters were tuned in such a way that 
about the same acceptance rate was achieved in both algorithms, namely 82% and 86% for the 



HMC and PHMC algorithms, respectively, on the 4 4 and, correspondingly, 80% and 79% for 
the 8 4 lattices. 

With the choice of cm in table [L] we found the value of the largest eigenvalue \ m ax(Q 2 ) °f 
the preconditioned matrix Q to be close to 1. As first noted in fll9" |, since the polynomial is 
constructed such that it provides a uniform approximation of Q~ 2 for e < A < 1, lifting the 
eigenvalues by choosing cm < 1, allows to choose a larger value of e and therefore a polynomial 
of lower degree in order to achieve a desired value for the accuracy parameter S. 

We give in table ^] results for the expectation values of the plaquette < P > and the lowest 
eigenvalue < \ m i n (Q 2 ) > ■ We also give the uncorrected expectation values (setting W = 1 in 
eq.(0)) denoted by a *. In the third column we give the number of trajectories that were taken 
for the analysis. 



Table 2: Results for both algorithms 



Lattice 


Algorithm 


# trajectories 


(P) 


\^min\^C )) 


4 4 


HMC 


18000 


0.66179(13) 


0.01582(9) 




PHMC 


18000 


0.66169(16) 


0.01570(10) 




PHMC* 


18000 


0.66248(13) 


0.01324(7) 


8 4 


HMC 


2745 


0.57251(12) 


0.001310(51) 




PHMC 


2560 


0.57253(16) 


0.001328(51) 




PHMC* 


2560 


0.57272(14) 


0.001141(51) 



First of all, table |2] confirms the correctness of the PHMC algorithm. While on the 4 4 lattice 
the correction factor is important, one notices that for the 8 4 lattice it only has a small effect. 
A crucial question is, whether the correction factor introduces strong fluctuations that may 
lead to large errors for the corrected observables. We find that this is not the case, when we 
arrange for a situation where e is 2-3 times larger than the lowest eigenvalue of the problem 
and the relative fit error of the polynomial is kept small enough, 5 ~ 0.02. For larger values of 
5 the fluctuations can become substantial, leading to large errors for the corrected observables 

eq.(§. 

In addition, the fluctuations of the corrected observables can be suppressed further by 
choosing the number of updates of the //-fields to be larger than the number of full gauge field 
updates. This amounts to compute the correction factor N corr times on the same gauge field 
configuration. In our test, presented here, we have chosen N corr = 1 on the 4 4 lattice and 
N C orr = 2 on the 8 lattice. 

Since the behaviour of the observables from the HMC and the corrected ones eq. (§) from 
the PHMC algorithm may in principle be very different, it is important to find an estimate for 
the true error in order to be able to compare both algorithms. To this end, we used a jack-knife 
binning procedure, looking for a plateau in the blocked errors. For the HMC algorithm, as a 
consistency check, we determined also the integrated autocorrelation time computed directly 
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Figure 1: The blocked jack-knife errors for the lowest eigenvalue 
Amm(Q 2 ) from the HMC algorithm (filled squares). Nb is the bin- 
ning block length. The dashed line ist the estimate for the blocked 
error from the integrated autocorrelation time in the HMC algorithm. 
The shaded region is the estimate for the true error from the PHMC 
algorithm. 



from the autocorrelation function. The result is illustrated in fig. |l| for the case of the lowest 
eigenvalue of Q 2 on the 8 4 lattice. The filled squares are the blocked errors A(A m j n (Q 2 )) as a 
function of the block length iVj, as obtained from the HMC algorithm. iV& = 1 corresponds to 
the naive error A naive , JVj, = 2 to blocking two consecutive measurements and so on. The dashed 
line indicates the true value of the error as computed from the integrated autocorrelation time 
r, A 



true 



2rA T 



For the PHMC algorithm on the 8 4 lattice we ran on the QH2 version of the APE machine 
with 256 nodes. Distributing the 8 4 lattice on 8 of these nodes, gives us 32 independent 
systems, from which the error can be evaluated straightforwardly. One may also build from 
these 32 systems 2 groups, each consisting of 16 independent systems and giving a separate 
error estimate Ai and A 2 . We take the difference between Ai and A 2 as an estimate of the 
"error of the error" . We plot this uncertainty of the error as the shaded region in fig. [I]. The 
same analysis can be made for the plaquette with a similar result. 



We conclude that with the same number of trajectories both algorithms give compatible 
error bars for the plaquette and the lowest eigenvalue. Note, however, that with our statistics 
the error on the error is still significant. This is, of course, just a reflection of the uncertainty 
in the determination of the autocorrelation times. 

Let us discuss now the cost of a single trajectory in both algorithms. We write the total 
cost for the algorithms as 

Ctot — Cq^ + C extra , (17) 

where the first contribution is given by the number of matrix times vector Qcj) operations and 
the second part accounts for all other operations. Asymptotically, when the condition number 
of Q becomes large, Cq^ will by far dominate the cost of the algorithms. We will therefore 
only discuss and compare the cost Cq^ in the following. Let us remark, however, that for small 
condition numbers C ex t ra can be a non-negligible part of the total cost, in particular for the 
HMC algorithm as one might deduce from the details of the algorithm structure. 

Let us denote by Nqg the average number of iterations of the Conjugate Gradient algorithm 
that is implemented in our programs for all matrix inversions. Then the cost for the HMC 
algorithm in units of Qcj) operations is given by 

Cq^HMC) = 2 • (2iV md + 1) • N CG , (18) 

where the first factor of 2 stems from the fact that one needs 2 Qcj) operations in each iteration 
of the CG routine. The factor (2N m d + 1) originates from the use of the Sexton- Weingarten 
integration scheme |l]| . The cost for the PHMC algorithm is split into three parts, 



Cq((,(PHMC) — Cbhb + Cupdate + Ccorr , (19) 

where Cbhb is the cost for the heatbath of the bosonic fields, C up d a te the cost for the force 
computation and C corr the cost to evaluate the correction factor. In units of Qep operations we 
find 

C bhb = 2n-Ng£ + n 

C U pdate = 3n • {2N m d + 1) 

The factor N corr denotes as above the number of evaluations of the correction factor W per 
full gauge field update. The factor 3n in C up d a te comes for the following reason. One needs 
basically 2n Qcj) operations to construct the fields 4>j of eq . (|i5|) . The computation of the total 
force needs a loop over the number of fields, n. In each iteration of this loop one has to 
compute the variation of the action with respect to the gauge fields for all four directions. 
This computation corresponds roughly to one Qcj) multiplication. We explicitly verified this 
expectation for our implementation of the PHMC algorithm on the APE computer. We expect, 
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however, the formula for C up( i a te in the PHMC algorithm to also hold for other situations like 
improved Wilson fermions. 

We give the cost of both algorithms in table [3| We see that on the 8 4 lattice, we win about 
a factor of 1.8 against the HMC algorithm. Let us make a few remarks at this point. 

Table 3: Cost for both algorithms 



Lattice Algorithm C bhb C upda te C corr C, 



Q<!> 



4 4 PHMC 130 324 86 540 

HMC 868 868 

8 4 PHMC 350 3024 600 3974 

HMC — 7398 — 7398 

(Correction factor) We find for our PHMC algorithm that the plaquette has an autocorrela- 
tion time of two while for the lowest eigenvalue the autocorrelation time comes out to be about 
four. In this situation it is not necessary to calculate the correction factor for every trajectory. 
Indeed, the correction factor should be more considered as part of the measurement and, from 
this point of view, its cost should be added to the measurement costs. For observables, for 
which the measurements are time consuming, or for situations where not every trajectory is 
measured, the cost of the correction factor becomes negligible. 

(Long trajectories) If we go to larger lattices and situations with larger condition numbers 
than considered here, the number of steps per trajectory, N m d, is increased when the trajectory 
length is kept fixed. We therefore expect that again the overheads for the correction factor and 
also for the boson heatbath will become negligible. From the above numbers, we conclude that 
the cost for the update itself, C up d a te, is reduced in the PHMC algorithm by more than a factor 
of two as compared to standard HMC. 

(Parallelization) When using massively parallel architectures, it is often advantageous to 
simulate several lattices simultaneously. If one uses the HMC algorithm, on SIMD architectures 
all the systems have to wait until the system with the largest number of CG iterations has 
converged. If one compares the number of CG iterations from running 32 replicas in parallel, 
iVcc (32 systems), to the one from running only a single system, Ncg{^- system), one finds that 
the ratio iVcG (32 systems )/Ngg(1 system) may easily reach values of about 2. In the PHMC 
algorithm, at least in the most time consuming part, the number of Q<p operations is, however, 
fixed by the degree of the polynomial and the same for each system. We expect therefore for 
this situation the PHMC algorithm to give an additional gain. Let us emphasize that, of course, 
all of the costs of the algorithms given in table 3 refer to the case of running a single system. 

In conclusion, we have presented a new algorithm, called the PHMC algorithm which is a 
hybrid of the standard HMC algorithm and the multiboson technique to simulate dynamical 
fermions. Within the uncertainty of the error determination, shown in fig. [I], we find that for 
the same number of trajectories, the errors from the HMC and the PHMC algorithms are about 



equal. At the same time, the cost of generating a single trajectory is reduced by almost a factor 
of 2 when using the PHMC algorithm. Certainly, the properties of the PHMC algorithm have 
to be investigated more, different observables should be considered and, of course, improved 
fermions should be studied. However, we find our results very promising to finally find a real 
gain of about a factor of 2 over the standard HM C algorithm. 

As a rule of thumb we advise to choose the lowest end of the fit range, e, two or three times 
larger than the lowest eigenvalue of Q 2 and the degree n of the fitting polynomial such that the 
accuracy parameter 5, introduced in eq.(|T3|), is between 0.01 and 0.02. For too large values of 
S, the fluctuations of the corrected observables eq.(||) become too large. For too small values, 
the degree of the polynomial increases too much. 

Even more than the practical gain that we anticipate, we think that the PHMC algorithm has 
an advantage which is of principle nature. It has been demonstrated that for Wilson fermions, 
with and without Symanzik improvement, fermionic (almost) zero modes may appear in the 
quenched approximation []TB], |T7|]. Such modes distort the statistical sample substantially. On 



the other hand, as discussed in [|I7|], the full path integral is finite and the fermion zero modes 
are cancelled by the measure. 

The way the standard fermion simulation algorithms deal with the zero modes, leaves us 
with a dilemma. Either these algorithms suppress the zero modes so strongly that in practical 
simulations configurations carrying (almost) zero modes do not occur at all. But then we do 
not know what their importance is on physical observables, which is unfortunate in particular 
within the context of topology. Or, on the other hand, a few configurations with (almost) 
fermion zero modes are actually generated. But then they will lead to exceptional values for 
quark propagators and a reliable measurement of the observables involving them will become 
very difficult. 

In our PHMC algorithm, the update part is safe against the zero modes, since the infrared 
cut-off parameter e leaves the polynomial always finite. One may, however, monitor the lowest 
eigenvalue and its eigenvector during a simulation by using minimization techniques like the one 
described in |18| . If an isolated zero mode is detected, one may switch from the computation 



of the correction factor discussed above to the following strategy. What we want to compute is 



det 



Q 2 P{Q 2 ) = det[A] . (21) 



Since we know the lowest eigenvalue \ min (A) = \ m i n (Q 2 )P(^min(Q 2 )) and its eigenvector x, we 
may define a projector P x that projects onto the subspace orthogonal to x leading to a matrix, 
where the lowest eigenvalue is taken out, 

A = A — A m i n {A)P x . {^^) 

Now, it is not difficult to show that 

det [A] = X min (A)det [A] , (23) 
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where the factor del A may again be evaluated with the help of Gaussian bosonic fields f\. 

For pure gauge observables, like Wilson loops, we find therefore that the configurations 
carrying zero modes have a negligible weight in eq.(|8[). For a fermionic observable involv- 
ing quark propagators, the situation is different because in the numerator of eq.(||) the zero 
mode configurations may give a finite, non-vanishing contribution, while in the denominator 
these configurations do not contribute. In this case the strategy will be to again separate out 
the leading divergent contribution to the observable, which, when considering two degenerate 
quark flavours, may be at most proportional to A~* n (Q 2 ). Technically, this can be achieved 
again by projecting the lowest eigenvalue out. The divergence possibly appearing in the lead- 
ing contribution will now be cancelled by the infinitesimal factor (proportional to X m in{Q 2 )) 
from the correction factor yielding, as expected, a finite, non-vanishing, well defined result. 
The non-leading contributions to the fermionic observable (i.e. the ones less divergent than 
KniniQ 2 )) ma y a l so be evaluated, by basically inverting the matrix Q 2 — A m j n (Q 2 )-P x , which is 
now well conditioned. Note that these contributions are suppressed by the correction factor as 

^min\Q ) — > 0. 

The above discussion may be generalized to a situation where a number of eigenvalues 
assume very small values. We therefore find that our PHMC algorithm is in principle able 
to take eventual zero modes into account in a controllable way when performing dynamical 
fermion simulations. 
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